Nature Biotechnology
Springer Science and Business Media LLC
Preprints posted in the last 7 days, ranked by how well they match Nature Biotechnology's content profile, based on 147 papers previously published here. The average preprint has a 0.35% match score for this journal, so anything above that is already an above-average fit.
Hu, S.; Cheng, H.; Gillenwater, L.; Manpearl, K.; Mandava, A.; Wang, Y.; Pividori, M.; Stranger, B.; Krishnan, A.; Greene, C.; Gao, Y.
Objective. Biomedical knowledge graphs (KGs) such as PrimeKG, Hetionet, UMLS, and PharmGKB are increasingly used as the substrate for downstream machine-learning, retrieval-augmented generation, drug-repurposing, and electronic health record (EHR) augmentation pipelines. The dominant assumption in published work is that integrating two or more such KGs is a tractable engineering step solved by identifier (ID) matching. This paper interrogates that assumption empirically. We quantify how much concept overlap survives realistic alignment, and we characterize the new failure modes introduced by the methods that practitioners reach for when ID matching is insufficient. Materials and Methods. We compared four widely used biomedical KGs (PrimeKG, Hetionet v1.0, the full UMLS Metathesaurus, and PharmGKB) across eleven node types using a tiered alignment pipeline: (1) direct ID matching for nodes sharing a primary vocabulary; (2) cross-ontology bridging using standard mappings (e.g., MONDO-DOID, HPO-UMLS, HPO-UMLS-MeSH for side effects, NCBI Gene-HGNC-UMLS, UBERON-FMA/SNOMEDCT_US/NCI/MeSH for anatomy); (3) ClinicalBERT cosine-similarity grouping at threshold >= 0.98 for over-segmented disease nodes, with a deterministic suffix-stripping canonicalizer; (4) exact name matching for ontology-poor types (anatomy, REACTOME pathways); and (5) embedding-based fuzzy matching with UMLS lookup (SapBERT and ClinicalBERT) for free-text microbiome concepts. We applied the pipeline to a 698-concept gut-microbiome benchmark spanning taxa, pathways, and disease labels, validated grouping decisions against the curated SSSOM mappings released by the MONDO project, and audited the ClinicalBERT consolidation against five clinical-genetics case studies drawn from the literature. Results. Per-type pairwise coverage was strikingly asymmetric. 
Genes/proteins and the three Gene Ontology categories aligned cleanly across PrimeKG and Hetionet (mutual coverage 94-99%), but disease overlap was sparse: only 0.7% of PrimeKG individual disease nodes mapped to Hetionet, rising to 2.0% after MONDO grouping (versus 78.7% and 18.4% from the Hetionet side). PrimeKG-to-UMLS coverage spanned 100% (effect/phenotype via HPO) down to 20.8% (REACTOME pathways), with drugs at 73.7% and anatomy at 58.8%. PrimeKG-to-PharmGKB drug coverage required up to two bridging hops (DrugBank -> UMLS -> RxNorm/ATC/MeSH). Bigger was not uniformly more complete: on a 698-concept microbiome drug benchmark, Hetionet missed 0 concepts while PrimeKG missed 16. ClinicalBERT-based grouping consolidated 22,205 raw MONDO disease nodes into 17,080 groups but introduced three reproducible failure modes documented in case studies: (i) peer over-merging: for example, all 22 osteogenesis imperfecta subtypes collapsed into a single node despite distinct severity classes; (ii) parent-child collapse: e.g. acute myeloid leukemia merged with myeloid leukemia, erasing the acute/chronic distinction that drives clinical management; and (iii) lexical false positives: neurofibromatosis and schwannomatosis grouped together despite cellular-pathology differences. Discussion. Identifier matching alone is a weak baseline for biomedical KG integration. Cross-ontology bridges and embedding-based consolidation expand coverage but do so at the cost of clinically meaningful resolution, and the resulting failures are systematic rather than random. Reporting only aggregate coverage statistics obscures these losses, which propagate silently into downstream tasks. Conclusion. We provide reusable per-type coverage tables, a taxonomy of three integration failure modes, and concrete recommendations for downstream studies that depend on a unified biomedical KG. 
We argue that future KG integration work should report per-type coverage and per-cluster confidence rather than aggregate match rates.
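The peer over-merging failure mode described above can be illustrated with a minimal sketch. It assumes the grouping is single-linkage (any pair at or above the threshold is merged, transitively); the abstract does not state the exact grouping procedure, so that is an assumption. The 2-D vectors and node labels are hypothetical stand-ins, not ClinicalBERT embeddings.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def group_by_threshold(embeddings, threshold=0.98):
    """Single-linkage grouping (an assumption): union every pair whose
    cosine similarity is at or above the threshold, using union-find."""
    parent = {name: name for name in embeddings}

    def find(x):
        # Follow parent pointers to the root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(embeddings, 2):
        if cosine(embeddings[a], embeddings[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in embeddings:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical toy vectors: subtype_1 and subtype_3 are below the
# threshold when compared directly, but both are close to subtype_2.
toy = {
    "subtype_1": (1.0, 0.00),
    "subtype_2": (1.0, 0.15),
    "subtype_3": (1.0, 0.30),
    "unrelated": (0.0, 1.00),
}
groups = group_by_threshold(toy, threshold=0.98)
direct_1_3 = cosine(toy["subtype_1"], toy["subtype_3"])
```

In this toy setup the three subtypes end up in one group even though `direct_1_3` is below 0.98, because merging is transitive. Reporting a per-cluster minimum pairwise similarity alongside the grouping is one way to surface such chains.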
Jacobs, L. A.
COVID-19 risk scores developed during the pandemic relied on measurements contemporaneous with infection, leaving unresolved whether the metabolic and inflammatory vulnerability they capture pre-existed as a stable trait or was triggered by acute illness. Here, using 501,946 UK Biobank participants whose blood was drawn between 2006 and 2010 (at least ten years before SARS-CoV-2 emerged), we show that baseline proteomic and metabolic profiles predict both COVID-19 hospitalization (2,783 events; C-statistic = 0.676 [0.666-0.686]) and COVID-19 mortality (1,564 deaths; C-statistic = 0.730 [0.701-0.760]) from parsimonious, regularized feature sets. The IL-1 pathway index (xIL1, +0.093) was independently selected for hospitalization but not mortality, while the IL-6 trans-signaling index (xIL6, +0.040) was selected for mortality but not hospitalization. This differential pathway weighting was corroborated by independent LightGBM/SHAP analysis and mirrors the subsequent success of tocilizumab (anti-IL-6R) and the limited efficacy of anakinra (anti-IL-1R) in reducing COVID-19 mortality in randomized trials conducted years later. The mortality model was additionally characterized by central adiposity (waist-hip ratio, +0.386), a respiratory compromise index (xRSP, +0.149), and prodromal cardiovascular disease (pCVD, +0.246). These findings establish that vulnerability to a novel pathogen is, in substantial part, a pre-existing and measurable prodromal state, with implications for pandemic preparedness and population-level risk stratification.
Zhang, C.; Chen, Y.-L.; Jamilov, A.; Liu, E.; Shree, S.; Lam, B. D.; Foy, B. H.
Most routine clinical markers are interpreted using population-based reference intervals, despite being regulated around patient-specific homeostatic setpoints. This mismatch obscures physiologic shifts, inhibiting detection of early disease signatures. Here, we develop a novel Bayesian inference method that adaptively constructs personalized reference intervals using each patient's existing health records. In analysis of >100 million lab tests in >800,000 patients, these personalized intervals can be accurately constructed with only minimal prior data, meaning this method can be applied near universally. We show that across 43 common lab markers, patient setpoints are strongly associated with future morbidity, with signal strength increasing as more test data is collected. Deviation from personalized reference intervals provides strong and novel risk signatures across diverse disease states, including hypothyroidism, hematologic cancers, kidney disease, and pregnancy complications. Importantly, personalized reference intervals capture a different risk signature to existing population-based approaches, with the highest risk patients being those who deviate from both intervals simultaneously. In a targeted clinical use case study of iron infusion, use of personalized reference intervals greatly improved prediction of treatment efficacy and allowed precise tracking of treatment responses. Our results illustrate how existing health records can be used to construct personalized benchmarks for nearly all common clinical tests, driving a new paradigm for precision laboratory medicine.
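To make the idea of a Bayesian personalized reference interval concrete, here is a minimal sketch using a textbook normal-normal conjugate update with a known within-person standard deviation. The abstract does not describe the authors' actual model, so this is an illustrative stand-in; the function name, parameters, and all numbers are hypothetical.

```python
import math


def personalized_interval(values, prior_mean, prior_sd, within_sd, z=1.96):
    """Posterior setpoint and predictive interval for one patient.

    Assumes (hypothetically) a normal population prior on the patient's
    setpoint and normally distributed, known within-person variation.
    """
    n = len(values)
    prior_prec = 1.0 / prior_sd ** 2          # precision of the population prior
    data_prec = n / within_sd ** 2            # precision contributed by n tests
    post_var = 1.0 / (prior_prec + data_prec)
    sample_mean = sum(values) / n if n else 0.0
    post_mean = post_var * (prior_prec * prior_mean + data_prec * sample_mean)
    # The next test result varies around the setpoint, so add within-person variance.
    pred_sd = math.sqrt(post_var + within_sd ** 2)
    return post_mean, (post_mean - z * pred_sd, post_mean + z * pred_sd)


# Illustrative numbers only: a marker with population mean 14.0 and SD 1.5,
# and a patient whose three prior results cluster near 13.0.
pop_mean, pop_interval = personalized_interval([], 14.0, 1.5, 0.5)
pers_mean, pers_interval = personalized_interval([12.9, 13.1, 13.0], 14.0, 1.5, 0.5)
```

With no prior tests the interval falls back to the population-level one; with a few results it tightens around the patient's own setpoint, which is the behavior the abstract describes as requiring only minimal prior data.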
Mao, Y.; Lopman, B.; Koelle, K.; Lau, M. S.
Accurate forecasting of seasonal influenza is critical for public health preparedness, and data-driven models are central to this effort. However, most approaches rely on aggregate indicators of influenza-like-illness (ILI), which can obscure heterogeneity and limit predictability at longer horizons. While subtype dynamics are well established, their role in data-driven forecasting remains incompletely understood. Here, we integrate subtype-resolved surveillance data into diverse data-driven frameworks using over a decade of U.S. surveillance records to evaluate and decompose predictive signal in influenza forecasting. Across pre- and post-COVID-19 periods, subtype-informed models consistently improve over baseline models trained on aggregate ILI alone, with the largest gains at longer horizons. Decomposition reveals a horizon-dependent reorganization of predictability: autoregressive persistence in recent aggregate incidence dominates at short horizons but declines with lead time, while predictive signal shifts toward subtype-derived structure. Within this structure, interaction-related features among co-circulating subtypes grow systematically with forecast horizon, indicating that longer-term predictability is driven increasingly by interaction structure rather than marginal subtype composition alone. Together, our results show that subtype information provides non-redundant predictive signal and extends the effective forecasting window of data-driven models. More broadly, our findings suggest that aggregation of heterogeneous subtype processes can obscure latent predictability, supporting subtype-resolved surveillance.
Nag, S.; Banerjee, S.; Banerjee, S.; Ghosh, S.; Bera, A.; Shanmugam, S.; Mondal, A.; Chakraborty, S.
Tuberculosis (TB) remains one of the deadliest infectious diseases, with over a million deaths annually and a growing threat from multidrug-resistant strains (MDR-TB). A major bottleneck in controlling TB is the lack of truly portable, rapid, and user-friendly diagnostic systems that can operate effectively in decentralized, resource-constrained settings. Here, we present a first-of-its-kind, portable nucleic-acid-based diagnostic platform that enables both primary TB screening and detection of drug resistance within the same unified framework, without any change in the operative embodiment. The system integrates loop-mediated isothermal amplification (LAMP) targeting dual Mycobacterium tuberculosis markers (IS6110 and IS1081) with a compact, AI-enabled device and smartphone-based readout, delivering rapid and reliable results at the point-of-care. Clinical evaluation across 105 samples demonstrated high sensitivity and specificity. Further validation through real-world deployment in a primary healthcare setting, using a single-gene (IS6110) configuration operated by minimally trained personnel, yielded 95.60% sensitivity and 100% specificity, benchmarked against GeneXpert. Critically, the same platform architecture, without modification, extends seamlessly to drug-resistance profiling, demonstrated here through a probe-free, allele-specific LAMP approach for identifying key mutations associated with rifampicin (rpoB) and isoniazid (katG) resistance. By combining robust molecular diagnostics with AI-driven automation in a compact and accessible format, this work represents a significant medical advancement toward democratizing TB care. The platform thus holds strong potential to enable early screening, guide timely treatment decisions, reduce transmission, and substantially strengthen global TB elimination efforts, particularly in high-burden, low-resource settings.
Cavon, J.; Perez, C.; Quinn-Bohmann, N.; Magis, A. T.; Gibbons, S. M.
Emerging evidence links the gut microbiome to sleep quality, yet measuring sleep at scale remains challenging. Commercial wearables, such as Fitbit, capture objective sleep and activity data in naturalistic settings. We integrated Fitbit data from a large, deeply-phenotyped cohort with paired lifestyle and health questionnaires. Wearable-derived measures aligned well with self-reported sleep, activity, and happiness. We identified dozens of covariate-adjusted associations between Fitbit-derived sleep features, lifestyle factors, and multi-omic data. Among molecular feature sets, the gut microbiome showed the greatest number of associations with sleep quality: butyrate-producing genera were positively associated with sleep and amplified the benefits of physical activity. Oscillospira, in particular, was consistently associated with better sleep. In blood, insulin, omega-3, and cortisol correlated with poorer sleep, whereas lower alcohol intake and mineral supplements correlated with better sleep. These robust, covariate-adjusted findings advance mechanistic understanding of the gut-sleep axis and broader molecular and lifestyle determinants of sleep quality.
Wang, S.; Mapar, P.; Moldovan, N.; van der Pol, Y.; Safrastyan, A.; van Werkhoven, E.; Tantyo, N. A.; Snieder, B.; Do Brito Valente, A. F.; de Jong, A. V.; Dinmohamed, A.; Drees, E. E. E.; Roemer, M. G. M.; Ylstra, B.; Klerk, C. P. W.; Strobbe, L.; Sandberg, Y.; Boersma, R. S.; Koene, H.; Pruijt, H.; de Heer, K.; van Rijn, R.; Bilgin, Y. M.; de Jongh, E.; Nijland, M.; van der Poel, M.; Koster, A.; Nieuwenhuizen, L.; Fijnheer, R.; Beeker, A.; Mous, R.; Vergote, V. K. J.; Vermaat, J. S. P.; Pegtel, D. M.; Chamuleau, M. E. D.; Mouliere, F.
Curative-intent immunochemotherapy fails in ~30% of patients with large B-cell lymphoma (LBCL), yet no validated molecular tool enables early identification of high-risk individuals to guide treatment intensification. Using shallow whole genome sequencing (sWGS) of plasma cell-free DNA from 190 LBCL patients, we developed and validated the ACT score (Aberrations, fragment Composition, Terminal motifs), a composite classifier integrating genomic and fragmentomic features from a single post-cycle-1 sample. ACT-positive patients had worse 2-year outcomes versus ACT-negative patients: time-to-progression 29% vs. 83% (HR 4.4, 95% CI 1.9-10.0; P = 1.5 x 10^-4) and overall survival 47% vs. 93% (HR 8.7, 95% CI 3.0-25.4; P = 1.8 x 10^-6). The ACT score was prognostic independently of the International Prognostic Index, and their combination identified the highest-risk patients. Unlike mutation-based approaches, this assay requires neither tumor tissue, a germline control, nor a baseline plasma sample. Built on open-source tools and sWGS, the ACT score offers a feasible, scalable strategy for early risk stratification in aggressive LBCL.
Berger, C. G.; Puttfarcken, B.; Qiu, J.; Hauer, I.; Herr, S.; Juestel, D.; Pleitez, M. A.
We present a compact pump-and-probe mid-infrared Optothermal Spectrometer (OTHES) equipped with Spatial Probing and Autocorrection (SPAC) optimized for robust intravital application in humans. SPAC-OTHES facilitates alignment stability and spectral comparability across different measurement sessions involving different skin types. Unlike state-of-the-art systems, SPAC-OTHES uses camera-based beam detection and an auto-calibration mechanism that enables ca. 73% better spectral reproducibility in intravital measurements in human volunteers than non-calibrated readouts. Moreover, SPAC-OTHES has the potential to lower the glucose quantification error, as demonstrated here in artificial skin phantoms, where an improvement of 52% compared to conventional diode-based detection was observed. The compactness of OTHES, combined with reliable SPAC-readout, has the potential to accelerate commercialization and broad application of biosensors based on mid-infrared spectroscopy.
Ofordile, O. N.
Using a longitudinal cohort of 633 Gambian children (IHAT-GUT, NCT02941081), we resolve two mechanistically distinct ecological pathways linking Prevotella stercorea to infection risk. Its abundance positively predicts gut microbiome richness, consistent with community-level colonisation resistance for enteric outcomes. However, its association with reduced acute respiratory infection (ARI) persists unchanged after richness adjustment, identifying a species-autonomous pathway independent of community diversity. Weight-for-age z-score (WAZ) is uncorrelated with microbiome richness within strata, supporting WAZ as a proxy for host immune-metabolic reserve rather than a determinant of microbiome composition. In Low-WAZ children, P. stercorea at Day 1 associates with suppressed CRP, whereas in higher-WAZ children, elevated Day 1 inflammation predicts subsequent P. stercorea colonisation at Day 85, consistent with host-context-dependent immune selection. ARI and fever protection is richness-independent and concentrated in Low-WAZ children. P. copri does not retain an independent protective association when modelled jointly. These findings have direct implications for microbiome-directed interventions.
Napier, A.; Wiley, J.; Heslin, M.
A closed-loop quality system deployed across thirteen US hospital sites resolved physician complaints with zero regressions on 42 tracked cases across 1,089 optimization iterations, while a deterministic assembly-agent replacement cut H+P trace latency from 19.6 s to 10.8 s (-8.8 s, 95% CI [-10.5, -7.1] s; n = 100 pre, n = 100 post). We report four observations and an architectural follow-through. First, the same binary-check instrument produces opposite outcomes depending on the question asked: "maximize this score" produces structurally-correct notes that physicians reject (Spearman rho = -0.077, 95% CI [-0.40, 0.26], n = 36); "did this specific fabrication stop?" produces rater-invariant deployment decisions. Second, in our pipeline, assembly-stage agents did not respond to prompt optimization the way reasoning agents did: four consecutive optimization attempts produced 18-28 point regressions. Third, physician preference is rater-fragile at typical clinical-AI calibration sample sizes (Cohen's kappa = 0.028 between two board-certified physicians, 95% CI [-0.30, 0.36] on n = 35 overlapping pairs). Fourth, the architectural punchline: six weeks after the prediction, the LLM call at the chart-assembly step was replaced with a deterministic renderer (sub-500-character template plus sandboxed scripting), lifting the defect-free rate on a 51-case holdout from 49% to 84%. We introduce a Pareto-with-absolute-floors acceptance rule (multi-axis commit with severity-class categorical vetoes) as a methodological contribution distinct from scalar-reward acceptance in standard prompt-optimization frameworks. Cross-iteration rejection memory prevents the loop from re-proposing edits already rejected three or more times. A reproducibility bundle (anonymized ablation per-case counts, bootstrap-CI data, analysis scripts) is released under CC BY 4.0 at github.com/sayvant/SQS-Auditor-paper-data.
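The contrast between a scalar-reward acceptance rule and a Pareto-with-absolute-floors rule can be sketched as follows. This is one plausible reading of "multi-axis commit with severity-class categorical vetoes", not the authors' implementation; the axis names, floor values, and the `fabrication` defect class are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Evaluation:
    scores: dict                                   # axis name -> score, higher is better
    defect_classes: set = field(default_factory=set)


def accept(candidate, baseline, floors, veto_classes):
    """Accept a candidate edit only if all three conditions hold:
    1. it shows no defect in a veto class (categorical veto),
    2. every axis with a floor is at or above that floor (absolute floors),
    3. it is at least as good as the baseline on every axis and strictly
       better on at least one (Pareto improvement)."""
    if candidate.defect_classes & veto_classes:
        return False
    for axis, floor in floors.items():
        if candidate.scores[axis] < floor:
            return False
    axes = baseline.scores.keys()
    no_regression = all(candidate.scores[a] >= baseline.scores[a] for a in axes)
    some_gain = any(candidate.scores[a] > baseline.scores[a] for a in axes)
    return no_regression and some_gain


# Hypothetical axes and thresholds, for illustration only.
floors = {"accuracy": 0.75, "completeness": 0.60}
vetoes = {"fabrication"}
baseline = Evaluation({"accuracy": 0.80, "completeness": 0.70})

pareto_gain = Evaluation({"accuracy": 0.85, "completeness": 0.70})
trade_off = Evaluation({"accuracy": 0.95, "completeness": 0.65})
vetoed = Evaluation({"accuracy": 0.90, "completeness": 0.80}, {"fabrication"})

decisions = [accept(c, baseline, floors, vetoes) for c in (pareto_gain, trade_off, vetoed)]
```

Note that `trade_off` has a higher mean score than the baseline, so a scalar-reward rule would commit it, while the Pareto rule rejects it because completeness regresses.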
Lu, S.; Ruan, X.; Wang, L.; Wang, X.; Sameer, M.; Liu, H.
Although GLP1/GIP receptor agonists demonstrate unprecedented weight loss efficacy, their rapid clinical adoption has revealed significant real-world tolerability challenges. To evaluate their dynamic safety profiles, we developed a macro-to-micro pharmacovigilance framework by combining global FAERS reports with local UT Physician EHR data. Macroscopically, we used disproportionality analysis of FAERS to distill 17 adverse events shared across the drug class. Microscopically, local EHR data (289,655 longitudinal treatment sessions across 71,316 patients) revealed that 51.6% of GLP1 sessions terminated within 90 days. Furthermore, temporally stratified logistic regression demonstrated that initial exposure (0 to 30 days) correlated strongly with nausea and vomiting, which attenuated in extended sessions, whereas extended exposure (>2 years) uncovered late-onset risks, notably incident hepatic steatosis. Ultimately, this time-aware framework reveals that GLP1 safety profiles are profoundly duration-dependent, providing critical insights into both acute intolerances and long-term medication safety.
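Disproportionality analysis on spontaneous-report data is often summarized with a reporting odds ratio (ROR) from a 2x2 table. The abstract does not say which disproportionality measure was used, so the ROR here is an assumed, commonly used choice, and the counts are invented for illustration.

```python
import math


def reporting_odds_ratio(a, b, c, d, z=1.96):
    """ROR with a Wald confidence interval on the log scale.

    a: reports with the drug and the event
    b: reports with the drug, without the event
    c: reports with other drugs and the event
    d: reports with other drugs, without the event
    """
    ror = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(ror) - z * se_log)
    upper = math.exp(math.log(ror) + z * se_log)
    return ror, lower, upper


# Invented counts, for illustration only.
ror, lower, upper = reporting_odds_ratio(a=40, b=160, c=100, d=1900)
# A common screening convention treats a lower bound above 1 as a signal.
is_signal = lower > 1.0
```

A signal from this kind of screen only indicates disproportionate reporting; it does not establish incidence or causation, which is why the abstract pairs it with longitudinal EHR data.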
Rodriguez, X.; Perez-Jimenez, J. G.; Alexander, L. W.; Lezcano-Coba, C.; Galue, J.; Juarez, Y.; Beltran, D.; Smith, D. R.; Kadir, M.; Ali, D. W.; Corrales, R.; Trujillo Rodriguez, L.; Valdiviezo, G. E.; Thomas, Q. K.; Cicalo, A.; Fitzpatrick, M. C.; Luquette, A. E.; Cameron Sayer, L.; Cer, R. Z.; Malagon, F.; Grajales, I. A.; Rivera, L. F.; Gonzalez-R, Z.; Antioco, J.; Walters-Valdes, E.; Meneghello-Ponce, N.; Vittor, A. Y.; Escobar-Lee, K.; Abouganem-Shaw, A.; Rodriguez, F.; Aguirre, E.; Loyola, S.; Tinoco, Y.; Moreno, B.; Chen-German, M.; Ampuero, S.; Gomez-Angelo, A.; Correa-Duarte, S.; Ace
Oropouche virus (OROV) spread across the Americas in 2024, yet Panama's Darien migration corridor saw no outbreak until nearly a year after Brazil's January 2024 peak, raising two hypotheses: cryptic circulation masked by diagnostic gaps, or recent introduction under permissive climatic conditions. Here we resolve this paradox using integrated clinical, genomic, and climate-informed surveillance. Among 1,040 individuals tested, 43% were OROV-positive and showed a clinical signature distinct from co-circulating arboviruses, including headache more frequent than in dengue (RR 2.38, 95% CI 1.74-3.24). The household secondary attack rate was 56%, and waste burning independently predicted infection. Phylogeographic reconstruction identified a single recent introduction in October 2024 with no evidence of adaptive evolution, excluding prolonged cryptic persistence. Climate-informed models indicate broad outbreak susceptibility across Panama, with Bocas del Toro and Los Santos as the next highest-risk provinces. These findings identify a Central American foothold for OROV with potential for further northward spread.
Bazemore, K.; Iqbal, T.; Kuzma, A. B.; Grant, S. F. A.; Schellenberg, G. D.; Wang, L.-S.; Chesi, A.; Jin, J.; Naj, A. C.
Pathway-specific polygenic risk scores (pathway-PRS) measure aggregate genetic risk across single nucleotide variants (SNVs) annotated to genes in a pathway of interest. In most applications, SNV-to-gene annotation is based on SNV position with respect to gene boundaries. This approach is ill-suited for incorporating non-coding SNVs, which can regulate gene expression over long distances and represent a large proportion of risk variants for Alzheimer's disease (AD). Here, we compare the performance of AD pathway-PRS across SNV-to-gene annotation strategies that integrate varying levels of functional genomic data, including adult brain chromatin interaction and expression quantitative trait loci (eQTL) data. In the UK Biobank (n=328,526), including AD cases defined by ICD-9/10 codes (n=3,043) and by family history of AD/dementia (n=38,589), we show that the annotation strategy integrating chromatin interaction and eQTL data consistently improves pathway-PRS performance. We replicate this finding in independent data from the Alzheimer's Disease Genetics Consortium (n=3,370). We further find that pathway-PRS associations with AD vary by annotation strategy and that power to detect sex-dependent and age-at-onset associations is increased with integrative annotation. Together, these findings support the use of functionally informed SNV-to-gene annotation for pathway-PRS construction and highlight the importance of applying multiple annotation strategies for robust inference.
Yang, K.; Shi, P.; Huang, H.; Musio, F.; Baazaoui, H.; Aydin, O. U.; Hilbert, A.; Hamadache, R. E.; Yalcin, C.; Zhang, M.; Falcetta, D.; de la Rosa, E.; Shit, S.; Prabhakar, C.; Wittmann, B.; Rokuss, M. R.; Kirchhoff, Y.; Al-Maskari, R.; Hoeher, L.; Juchler, N.; Casamitjana, A.; Cleary, J.; Schmick, A.; Baumgartner, P.; Deseoe, J.; Vandans, O.; Lee, D.; Oh, K.; LaBella, D.; Mazher, M.; Niederer, S. A.; Qayyum, A.; Liu, Y.; Chen, J.; Kim, W.; Asawalertsak, N.; Kim, M.; Shin, D.; Park, S.-H.; Kikuchi, S.; Zhang, Y.; Liu, J.; Cui, Y.; Qiu, Y.; Verschuur, A.; Zhang, J.; van der Schaaf, I.; Su, R.;
We present the TopBrain 2025 Challenge, the first benchmark for fine-grained multiclass segmentation of the whole brain vasculature in both computed tomography angiography (CTA) and magnetic resonance angiography (MRA). Building on the TopCoW challenge, TopBrain scales vessel annotation from the Circle of Willis to the entire brain, introducing a dataset of 90 annotated volumes across 48 landmark vessel classes spanning arterial and venous systems, of which 50 training volumes are publicly released. Vessel definitions were consolidated from established neuroanatomical references into a unified annotation scheme, and vessel caliber measurements along the centerline are reported for the first time across the whole brain vascular anatomy. To address the unique challenges of multiclass brain vessel segmentation, we propose an evaluation framework that accounts for detection in segmentation performance, assesses anatomical plausibility, and introduces novel contamination metrics that characterize inter-class prediction errors. Fifteen teams from over 220 registered participants submitted algorithms to the benchmark. The top-performing teams built on nnUNet with principled system design choices, achieving around 80% Dice scores, near-zero invalid neighbor counts, over 60% F1 scores for side-road vessels, and below 18% foreground contamination ratio. Larger vessels are easier to segment, while smaller and more complex vessels remain the true bottleneck. The annotated datasets and podium-finish algorithms are made publicly available on Zenodo.
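For readers unfamiliar with the headline metric, a per-class Dice score can be computed as below. This sketch covers only plain Dice on flattened label arrays; the challenge's detection-aware, plausibility, and contamination metrics are not reproduced here. Returning `None` for a class absent from both prediction and ground truth is a convention chosen for this example, not one taken from the challenge.

```python
def dice_per_class(pred, truth, classes):
    """Dice = 2 * |P and T| / (|P| + |T|), computed per class label
    over two equal-length sequences of voxel labels."""
    scores = {}
    for c in classes:
        pred_count = sum(1 for x in pred if x == c)
        truth_count = sum(1 for x in truth if x == c)
        overlap = sum(1 for x, y in zip(pred, truth) if x == c and y == c)
        if pred_count + truth_count == 0:
            scores[c] = None      # class absent from both; undefined here
        else:
            scores[c] = 2 * overlap / (pred_count + truth_count)
    return scores


# Tiny flattened example with background (0) and two vessel classes.
truth = [0, 1, 1, 2, 2, 2]
pred = [0, 1, 2, 2, 2, 1]
scores = dice_per_class(pred, truth, classes=[1, 2, 3])
```

Because Dice is normalized by region size, a one-voxel error costs a small vessel far more than a large one, which is consistent with the abstract's observation that smaller vessels remain the bottleneck.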
Mosquera, J. V.; Tang, I.; Murach, M.; Auguste, G.; Kodali, A.; Hart, P.; Shaw, D. M.; Li, M.; Turner, A. W.; Hodonsky, C. J.; Dworak, N. M.; de Oliveira, A. K.; Sol-Church, K.; Jhee, T.; van der Sijs, K. I. M.; Adkar, S. S.; Choi, R. B.; Vacante, F.; Wu, J. C.; Cheng, P.; Giannarelli, C.; Leeper, N. J.; Finn, A. V.; Bjorkegren, J. L. M.; Kovacic, J. C.; Yurdagul, A.; van der Laan, S. W.; Miller, C. L.
Advances in single-cell and spatial assays have revolutionized the scale and resolution of molecular tissue profiling. Here we present MetaPlaq, a multimodal atlas of human atherosclerotic arterial beds comprising over a million cells across single-cell transcriptomics, epigenomics and high-resolution spatial expression assays. We map granular cell states and disease-relevant transcriptional programs within the native tissue context of coronary arteries. Furthermore, we map cardiovascular GWAS signals to smooth muscle cells (SMCs) and endothelial cells (ECs) and uncover the cis-regulatory architecture governing their phenotypic transitions. Our comprehensive epigenomic reference allowed us to build cell-specific enhancer-gene link maps and multimodal gene regulatory networks (GRNs) underlying disease-relevant states such as osteogenic SMCs and ECs undergoing mesenchymal transition. We also integrate SMC and EC disease-associated gene sets with GRNs to nominate key transcription factors such as PRRX1, BNC2 and ELK3 regulating atherosclerosis-relevant transcriptional programs. Finally, we layer single-cell and spatial modalities to fine-map GWAS variants with improved cell and anatomical context. We highlight candidate cell-specific regulatory mechanisms at less characterized CAD loci, including FGD5 and MCF2L in ECs. Together, this atlas represents an important step towards fully interpreting genetic risk loci and informing new therapeutic strategies for cardiovascular disease.
Casalino-Matsuda, S. M.; Guggilla, V.; Gao, C. A.; Demeulenaere, K. E.; Cusick, L. P.; Fenske, S. W.; Yu, Z.; Lu, Z.; Swaminathan, S.; Grant, R. A.; Schleck, M. J.; Prakriya, M.; Hebbar, S.; Stauderman, K.; Donnelly, H. K.; Pickens, C.; Morales-Nebreda, L.; The NU SCRIPT Study Investigators; Wunderink, R. G.; Misharin, A. V.; Singer, B. D.; Budinger, G. S.
Viral pneumonia is perpetuated by inflammatory circuits between activated T cells and monocyte-derived alveolar macrophages (MoAM). T cells and macrophages express ORAI1 and STIM1, which form calcium release-activated calcium (CRAC) channels that allow extracellular calcium entry in response to endoplasmic reticulum calcium store depletion. In a randomized, placebo-controlled, multicenter phase 2 trial (CARDEA), Auxora, a CRAC channel inhibitor, reduced all-cause 30-day mortality by 56% in patients with severe SARS-CoV-2 pneumonia. Here, we report a multi-omics analysis of serially collected alveolar samples from unvaccinated patients with severe SARS-CoV-2 pneumonia treated with Auxora versus placebo. We found reductions in plasma levels of the monocyte- and T cell-chemokines, CCL8 and PDGF-AA. Using peripheral blood mononuclear cells (PBMC) from healthy volunteers, we show that Auxora directly targets T cells to inhibit the transcription of CCL8 and PDGFA in monocyte-derived macrophages, supporting a mechanism for its effects and a potential intermediate biomarker of efficacy.
Elemento, O.; Sigaras, A.; Colonel, J.; Hajirasouliha, I.; Ghosh, S.; Bensoussan, Y.; Bridge2AI-Voice Consortium; Rameau, A.
Vocal biomarkers, encompassing voice and speech, have largely been developed for individual conditions in isolation, limiting their generalizability across diseases and recording settings. To address this, we introduce VoiceFM, a contrastive model that learns general-purpose clinical voice representations by aligning audio embeddings with rich clinical metadata. Using the Bridge2AI-Voice dataset (984 primarily English-speaking adult participants, 846 used for training and 138 held out as a temporally separated validation cohort, 40,056 recordings totaling 176 hours across 5 academic medical centers), VoiceFM pairs a fine-tuned Whisper large-v2 encoder with a tabular transformer over 44 clinical features via symmetric InfoNCE loss. Linear probes on frozen VoiceFM embeddings achieve mean AUROC 0.952 +/- 0.005 across five evaluation tasks (control vs disease screening plus four disease categories), significantly outperforming Frozen Whisper (0.926 +/- 0.013, p = 0.013), Frozen HuBERT (0.885 +/- 0.017, p = 0.0009), and the contrastively trained VoiceFM-HuBERT (0.938 +/- 0.006, p = 0.012). On the 138-participant held-out cohort, VoiceFM-Whisper achieves AUROCs of 0.99 for Alzheimer's/dementia/MCI and 0.89 for airway stenosis, demonstrating that the learned representations generalize to participants the model has never seen. VoiceFM representations transfer to three external datasets without retraining and improve few-shot classification. Recording task attribution identifies a small set of speech tasks that match or exceed the full battery's performance, suggesting shorter screening protocols are feasible. Trained predominantly on English audio, VoiceFM transfers without fine-tuning to Spanish-language Parkinson's disease (PD) detection (NeuroVoz, 107 participants, AUROC 0.93 +/- 0.02), with the signal dominated by articulatory rather than phonatory features. 
A fine-tuned classifier achieves participant-level AUROC 0.87 (sustained 0.85, countdown 0.80) on the mPower smartphone study (585 held-out participants). Together, these results show that contrastive alignment between voice and rich clinical metadata can serve as the basis for a clinical voice foundation model, producing a single set of transferable representations that generalize across diseases, languages, recording conditions, and patients enrolled after model freeze.
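The symmetric InfoNCE objective mentioned above can be written out in a few lines. This is a dependency-free sketch for illustration: the temperature value and the toy 2-D embeddings are assumptions, and an actual training setup would use a deep-learning framework and learned encoders rather than plain lists.

```python
import math


def _normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)                                   # for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def symmetric_info_nce(audio, clinical, temperature=0.07):
    """Average of audio-to-clinical and clinical-to-audio InfoNCE losses.
    Row i of `audio` and row i of `clinical` form the positive pair;
    every other pairing in the batch acts as a negative."""
    a = [_normalize(v) for v in audio]
    c = [_normalize(v) for v in clinical]
    logits = [[sum(x * y for x, y in zip(ai, cj)) / temperature for cj in c] for ai in a]
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (_cross_entropy_diag(logits) + _cross_entropy_diag(transposed))


# Toy batch of two participants with hypothetical 2-D embeddings.
audio = [[1.0, 0.0], [0.0, 1.0]]
aligned = symmetric_info_nce(audio, [[1.0, 0.0], [0.0, 1.0]])
swapped = symmetric_info_nce(audio, [[0.0, 1.0], [1.0, 0.0]])
uninformative = symmetric_info_nce(audio, [[1.0, 1.0], [1.0, 1.0]])
```

The loss is near zero when each audio embedding matches its own clinical embedding, large when the pairs are swapped, and equal to the log of the batch size when the clinical side carries no distinguishing information.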
Kline, M. C.; Helekal, D.; Oliveira Roster, K. I.; Grad, Y.
The dynamics of sexually transmitted infections involve interconnected transmission networks, including men who have sex with men and heterosexual populations. Understanding the extent of bridging between these networks can inform surveillance, guide interventions, and aid in the interpretation of their impact, but methods for quantifying bridging have been lacking. Here, we addressed whether pathogen genomics tools, successfully used to reconstruct transmission in other contexts, could accurately infer sexual network bridging. Based on simulations of gonorrhea spread, we evaluated phylodynamic bridging metrics inferred by ancestral state reconstruction under a range of sampling schemes, from comprehensive to sparse. These metrics differentiated sexual network structures even with biased sampling schemes, but accuracy depended on the sampling scheme and density: phylodynamic bridging estimates using sequences from all detected infections for one network configuration were on average 6.9% above the true value, whereas estimates from 5% of infections in symptomatic men with many partners were on average >1000% above the true value. These results suggest routine overestimation of bridging from unadjusted inferences from genomics data and provide context for interpreting existing genomic surveillance data and targeted studies.
Sharma, R.; Hu, F.; Li, X.; Campos, R.; Kundu, K.; Atanur, S.; Karpinski, M.; Wasilewski, S.; MacArthur, S.; Vitsios, D.; Dhindsa, R. S.; Georgakopoulos-Soares, I.; Burren, O. S.; Petrovski, S.; Mustoe, A. M.; Wang, Q.; Glodzik, D.; Zou, X. Z.
Non-coding variants are important contributors to human traits and diseases but linking them to molecular mechanisms and phenotypes at scale remains challenging. G-quadruplexes (G4s) are four-stranded structures formed by guanine-rich sequences and have emerged as key functional elements within the non-coding genome. G4s are enriched in regulatory regions and can modulate gene expression at both the DNA and RNA levels, influencing transcription, replication, and RNA processing, positioning them as key mediators linking non-coding variation to complex biological traits. Here, we profile putative G4s across five regulatory regions in 459,449 UK Biobank genomes and perform phenome-wide association analyses spanning 2,941 plasma protein abundances, 13,321 binary traits, and 1,682 quantitative traits. We show that putative G4-modifying variants are depleted under purifying selection despite elevated local mutability and drive large, bidirectional associations with plasma proteins and clinical traits, including associations not captured by coding variants. Using a mechanism-aware collapsing strategy that groups rare non-coding variants by their predicted impact on G4 stability, we achieved stronger gene-level signals than those obtained with standard rare-variant collapsing approaches. Integrating non-coding and protein-truncating variants (PTVs) increases discovery power, revealing 843 significant associations missed by the PTV-only model. Replication in the Alliance for Genomic Discovery cohort demonstrates cross-cohort robustness. Our study suggests G4s as widespread mediators of non-coding regulation and provides a framework for mechanism-informed target discovery and prioritization across the non-coding genome.
Minoccheri, C.; Joo, P.; Hu, X.-S.; Affendi, H.; Elayyan, F.; Harville, A.; McDonald, N. J.; Botero, T.; DaSilva, A. F.
Neuroimaging-based pain decoding faces two underappreciated challenges: between-subject variability that prevents classifiers from generalizing across patients, and within-session cross-validation designs that inflate reported accuracy by conflating within-person and between-person variance. Here we address both using portable functional near-infrared spectroscopy (fNIRS) during pharmacologically verified local nerve anesthesia. Twenty-five patients with clinically painful teeth underwent 36-channel bilateral fNIRS during percussion before ("Pre") and after ("Post") local nerve anesthesia. In 13 block-success patients, a paired Pre versus Post comparison with healthy-tooth control identified three temporal hemodynamic response function (HRF) features (late slope, mean first derivative, and baseline-normalized amplitude) whose analgesia interaction effects (d = 0.63 to 0.79) exceeded that of raw general linear model (GLM) amplitude (d = 0.56), with a significant difference-in-differences interaction (p = 0.011). Per-patient calibration with these features yielded leave-one-subject-out (LOSO) AUC = 0.68 to 0.76 for nonlinear classifiers (permutation p = 0.002), with HbO-specific feature selection achieving the best performance (RF AUC = 0.760); a healthy-tooth negative control was non-significant. End-to-end deep learning on raw time series (CNN-LSTM AUC = 0.719) was competitive with feature-based classifiers, while linear models did not reach significance. Critically, head-to-head comparison of within-session CV and LOSO on the same data revealed mean inflation of +0.13 AUC across all model types, including deep learning, demonstrating that high within-session accuracy alone does not establish subject-independent validity. 
Exploratory analyses suggested complementary roles for oxyhemoglobin (HbO; within-patient analgesia detection) and deoxyhemoglobin (HbR; cross-patient information), and that trial-to-trial response variability may complement amplitude for cross-patient pain detection. These results show that per-patient calibration with temporal HRF features supports subject-independent analgesic-state detection under strict LOSO evaluation, and that within-session validation (standard in the fNIRS pain-decoding literature) can substantially overestimate performance.
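The difference between the two validation designs contrasted in this abstract can be illustrated with a short NumPy sketch. It is not the authors' pipeline: the function names are hypothetical, and a pooled, shuffled k-fold split is used here as an assumed stand-in for within-session cross-validation. The sketch only shows which trials end up in the training and test sets under each design.

```python
import numpy as np


def loso_splits(subject_ids):
    """Yield (train, test) index arrays, holding out one subject at a time."""
    subject_ids = np.asarray(subject_ids)
    for subject in np.unique(subject_ids):
        test = np.flatnonzero(subject_ids == subject)
        train = np.flatnonzero(subject_ids != subject)
        yield train, test


def pooled_kfold_splits(n_trials, k=5, seed=0):
    """Yield (train, test) index arrays from a shuffled k-fold over all trials.

    Assumed stand-in for within-session CV: trials are split without
    regard to which subject they came from.
    """
    order = np.random.default_rng(seed).permutation(n_trials)
    for fold in np.array_split(order, k):
        yield np.setdiff1d(order, fold), fold


def subject_leakage(subject_ids, splits):
    """Fraction of splits in which a test subject also appears in training."""
    subject_ids = np.asarray(subject_ids)
    leaks = [
        np.intersect1d(subject_ids[train], subject_ids[test]).size > 0
        for train, test in splits
    ]
    return float(np.mean(leaks))
```

Under LOSO the held-out subject never contributes training trials, so the classifier cannot exploit subject-specific signal. Under the pooled split, trials from the same subject typically fall on both sides of the split, which is one route by which reported accuracy can be inflated.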